SUMMARY

We often use empirical models in automation applications. 
These may relate process output to inputs at steady state, 
describe the interrelation between easily measured variables 
to predict a less easily measured process characteristic (infer- 
ential or soft sensors), describe the dynamic process response 
to manipulated variables, or predict the impact of processing 
conditions on product attributes. Each of these models would 
predict an output value that is continuous. As well, we use 
models to classify conditions or faults that produce discrete 
or logical (0, 1) outputs. These may include classification of 
faults from process data, or classification of flow regime (tur- 
bulent, laminar). Models are usually based on standard statis- 
tical regression equations of the form 

$$y = a + bx + cx^2 + dx^3 + \cdots \tag{13.1}$$

Neural networks (NNs) can do the same modeling job, 
and can have advantages in flexibility and efficiency. In a 
model such as that of Equation 13.1, the user has to explicitly 
choose the functional relation between y and x. In this case, 
a cubic relation is indicated. However, it could be reciprocal, 
logarithmic, or seasonal; and a user has to specify the right 
one. By contrast, in NNs the user does not have to specify the 
functional relations, and this convenience is often called the 
model-free attribute. 

NNs have demonstrated usefulness in automation, medi- 
cal and fraud diagnosis, and pattern recognition; and many 
businesses have been structured to provide both NN software 
and services. Within the process industry, NNs have repeatedly
demonstrated success in steady-state modeling for optimization
applications, instrument calibration, soft or inferential sensors
(such as emissions monitoring), dynamic modeling in nonlinear
model-predictive control, and condition monitoring and fault
diagnosis. Applications have been demonstrated in nearly every
sector of the process control systems, automation, and other
industries [1-9].

Desirably, we use best knowledge for process control and 
optimization, and a preference might be to use rigorous, first- 
principle models. However, rigorous models, for example, 
of distillation processes normally have iterative root-finding 
procedures for several variables at each stage, and the num- 
ber of iterations to obtain convergence depends on operat- 
ing conditions. Further, under some conditions, convergence 
may not be possible. The possibility of not converging and 
the uncertainty in computational time to convergence pre- 
vent such best knowledge from being used in automation. 
However, off-line, the rigorous model could generate data 
for an NN to model. The NN computes rapidly, deterministi- 
cally, and without internal convergence conditions, making 
the NN model applicable for automation. 

In spite of successes, NNs are an underutilized technol- 
ogy in the author’s opinion. There are three reasons that 
seem to be the cause of underutilization: First is the name 
neural network. Concepts for NNs were proposed prior to 
the middle of the last century, but their major development 
and popularity happened in the 1970s when computer power 
made research practicable. NNs were originally developed 
by educational psychologists and computer scientists in their 
attempt to develop computer programs that would process 
data in the way human neurons do. The concept was a pre- 
lude to developing computers that could “learn” as humans 
do; however, there is no independent cognitive “thinking” by 
NNs. But the name neural network and the associated educational
jargon and anthropomorphic allusions preserve that legacy, which
is exacerbated by the wariness that conservative managers might
have about the unknowable conclusions that an independently
thinking computer might want to implement. That fear is
unfounded, of course. There is no thinking
by an unknowable computer persona. NNs comprise line-by- 
line, deterministic, transparent, analyzable equations. 

Second, engineers attempted to use NNs in the 1980s,
when the first commercial products were somewhat primitive
and novice expectations somewhat high. Many could not get
their applications to work and, naturally, blamed the NN.
They remember “it doesn’t work,” which remains a
barrier to acceptance today.

Finally, as a barrier to acceptance, there is the language 
developed by a nonengineering, nonbusiness community of 
computer scientists and educational psychologists. The terms
“architecture,” “back propagation,” “memorization,” “valida- 
tion set,” “test set,” and many others mask easily understood 
and familiar concepts from linear regression. The terminol- 
ogy needs to be demystified. 

NNs have been proven successful in modeling and auto- 
mation in a wide range of applications, including robotics 
and chemical processes. This chapter presents the basic con- 
cepts, terminology, and the relatively simple mathematics of 
basic NNs in the hope that the reader will be able to deter- 
mine when and how to use NNs for process modeling within 
automation applications. 

INTRODUCTION 

We often use empirical models in automation applications,
which may appear as 

$$y = a + bx + cx^2 + dx^3 + \cdots \tag{13.2}$$

where 

y represents the response, output, or dependent variable 
x represents the cause, input, or independent variable 

Coefficients in the model (13.2) are determined by a least 
squares regression to make the model best match the experi- 
mental data. There are many uses for such equations in 
automation. For example, the equation might be used to 
model reaction yield (y) vs. reactor temperature (x), and the 
inverse of the relation (solving for x given a desired value 
for y) would be used to determine the reactor temperature set 
point that maximizes yield. In another example, the equation 
might represent the installed valve characteristic (flow rate 
as a function of valve stem position) and its inverse would be 
used in automation to determine the controller output given 
a desired flow rate. For this example, the derivative of the 
model, Equation 13.3, describes process gain, and might be 
used to gain schedule the flow rate controller: 

$$K_P = b + 2cx + 3dx^2 + \cdots \tag{13.3}$$
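As a minimal sketch of the gain-scheduling idea, the cubic model of Equation 13.2 and its derivative, Equation 13.3, can be coded directly. The function names and the coefficient values are hypothetical, chosen only for illustration; in practice the coefficients would come from a least squares regression.

```python
def cubic_model(x, a, b, c, d):
    """Equation 13.2 truncated at the cubic term."""
    return a + b * x + c * x**2 + d * x**3


def process_gain(x, b, c, d):
    """Equation 13.3: derivative of the cubic model with respect to x."""
    return b + 2 * c * x + 3 * d * x**2


# Hypothetical coefficients for an installed valve characteristic.
a, b, c, d = 0.0, 0.5, 1.2, -0.4

for stem in (0.2, 0.5, 0.8):
    flow = cubic_model(stem, a, b, c, d)
    gain = process_gain(stem, b, c, d)
    print(f"x = {stem:.1f}  flow = {flow:.3f}  gain = {gain:.3f}")
```

A gain-scheduled controller could divide its nominal gain by the local value of `process_gain` so that the loop gain stays roughly constant over the valve range.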

Usually a process response is dependent on several variables,
$x_1, x_2, x_3, \ldots$, and the regression model might appear as

$$y = a + bx_1 + cx_1^2 + dx_2 + ex_2^2 + fx_1x_2 + \cdots \tag{13.4}$$

In the above equations, nonlinearity is expressed in the terms 
with nonunity powers on the independent variables. The 
order of the nonlinearity is the highest power on the input 
variable. 

Note the explosion of the number of coefficients with the
number of independent variables and the nonlinear flexibility
(order). For an nth order relation of m variables, Model (13.4)
will have one intercept (coefficient a), n coefficients for each
of m variables (b, c, ..., d, e, ...), and coefficients for each
cross-product combination of variables. For a three-variable,
third-order model, that means 17 coefficients, and using the
statistical rule of thumb for regression that there should be
three times as many data sets as coefficients, that model would
need 51 experimental sets. We often compromise and reduce
the number of terms in the relations, accepting reduced
model accuracy in exchange for lower experimental cost. Further,
we often progressively add variables, one at a time, checking
model adequacy at each step, seeking to reduce model
complexity.

We also use time-dependent models when it is necessary 
to account for process lags and transport delay. Opening a 
valve starts a mechanism for increasing pressure, and pres- 
sure develops over time. Downstream measurements provide 
a delayed response to what is happening within the process 
unit. To be useful for control, equations must include the 
dynamic response. For a single-input-single-output (SISO) 
process the model might appear as 

$$y_0 = a_1y_{-1} + a_2y_{-2} + a_3y_{-3} + \cdots + b_1x_{-1} + b_2x_{-2} + b_3x_{-3} + \cdots \tag{13.5}$$

where the subscripts on the y and x terms indicate values
that many time steps in the past. This is an autoregressive
moving average (ARMA) model, of which there are many
versions. Notably, each past y and x value is acknowledged to
have an independent influence on the current output value
$y_0$. Also notably, each is modeled as having a linear impact
on the output value $y_0$. If a quadratic influence were permitted,
the number of homogeneous terms (hence number of
coefficients) would double. If cross-product quadratic terms
were included, then (n-2)(n+1)/2 additional terms would be
present. The explosion of coefficients for modeling even a
simple quadratic dynamic relation is experimentally impractical.
Even so, the nonlinearity would be explicitly modeled as
quadratic; if additional functionality were considered,
the number of required coefficients would become oppressive. So,
most model-based control is based on linear models.
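The linear structure of Equation 13.5 can be sketched in a few lines. The function name, the list ordering (most recent value first), and the numerical values are assumptions made for illustration only.

```python
def arma_predict(past_y, past_x, a_coeffs, b_coeffs):
    """Equation 13.5: y_0 = sum(a_k * y_-k) + sum(b_k * x_-k).

    past_y[0] is y_-1, past_y[1] is y_-2, and so on; likewise for past_x.
    """
    ar_part = sum(a * y for a, y in zip(a_coeffs, past_y))
    ma_part = sum(b * x for b, x in zip(b_coeffs, past_x))
    return ar_part + ma_part


# Hypothetical coefficients and history, most recent value first.
y0 = arma_predict(past_y=[1.0, 0.8], past_x=[0.5, 0.5],
                  a_coeffs=[0.6, 0.2], b_coeffs=[0.3, 0.1])
print(y0)
```

Each past value contributes one term and one coefficient, which is why adding quadratic and cross-product terms to this structure multiplies the number of coefficients so quickly.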

NNs can model nonlinear dynamic processes and do 
not require the user to specify the functional form of the 
nonlinearity. 

NEURAL NETWORK CONCEPTS 

Biological Inspiration for Neural Networks 

NNs were inspired by the mechanism of how human neurons 
transmit information. Figure 13.1 represents this elementary 
concept. Five nerve cells are illustrated, not to scale. At one 
end of the cell is the nucleus with the dendrites, which are 
sensitive to stimuli and excite the cell. At the other end of 
the axon are the chemical neurotransmitters. When the cell 
is excited, it sends an electrical signal along the axon, which 
releases the neurotransmitter chemicals. 
FIG. 13.1

A network of nerve cells at rest. 



FIG. 13.2 

Step 1 in the excitation of a network of nerve cells. 


Figures 13.2 and 13.3 illustrate the sequence of neuro- 
transmission events when the lower left cell is excited. In 
Figure 13.2, the excited cell releases chemicals in the vicin- 
ity of three other sets of cell dendrites. Note that the illustra- 
tion indicates that the proximity of chemicals to the dendrites 
does not excite the upper left cell, but does excite the two 
central cells. The attenuation (the degree that locally released 
chemicals would excite a cell) depends on proximity of den- 
drites to the transmitter chemicals, amount of chemicals 
released, intercellular material between axon and dendrites, 
etc. In Figure 13.3, the release of chemicals from the two 
excited central cells combines to provide a very strong excitation
signal to the upper-right cell, which would guarantee
the transmission of a release upstream in the information
transmission linkage. The eventual signal depends on degree
of input corroboration and attenuation in each possible
transmission path.

Fundamentals of Artificial Neural 
Networks and Terminology 

The name artificial neural network (ANN) is often employed 
to describe an electromechanical or digital computational 
device, which is created to mimic the biological neurotrans- 
mission process. More commonly, it is simply termed an NN. 
Figure 13.4 presents one elementary ANN concept. 

FIG. 13.3

Step 2 in the excitation of a network of nerve cells.


FIG. 13.4

Mathematical construct of a nerve cell.


This figure represents one artificial neuron with three
influences (input variables), $x_1$, $x_2$, and $x_3$. The strength of
each signal is attenuated by a weight, $w_1$, $w_2$, and $w_3$,
according to Equation 13.6:

$$s_i = x_i w_i \tag{13.6}$$

There could be more or fewer inputs to the neuron. The atten- 
uated inputs are summed, and the signal, z, represents the 
collective influence on the dendrites: 

$$z = \sum_i s_i = \sum_i x_i w_i \tag{13.7}$$

If the signal is strong enough, then the neuron is triggered.
This is indicated by the z to y transfer function within the
circle. In this illustration, the output value for y is a step
function, zero until the value for z exceeds a threshold,
according to Equation Set (13.8):

$$y = \begin{cases} 0, & z < a \\ 1, & z > a \end{cases} \tag{13.8}$$

The transfer function is a key element in what a neuron can 
do. With the step function shown, the neuron could classify 
binary choices (yes or no, zero or one, on or off, fault or nor- 
mal, etc.). However, many applications of NNs are intended 
to model continuous response variables, and in this case a 
sigmoidal (s-shaped) transfer function is more useful. The 
transfer function illustrated in Figure 13.5 and Equation 13.9 
is a bipolar sigmoidal function, with the bipolar meaning that 
the extreme values for y are -1 and +1. An alternate name is
hyperbolic tangent sigmoidal:

$$y = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} \tag{13.9}$$


FIG. 13.5

Mathematical construct of a nerve cell (sigmoidal).


Combining Equations 13.7 and 13.9, the bipolar sigmoidal
neuron calculation is

$$z = \sum_i x_i w_i, \qquad y = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} \tag{13.10}$$
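The neuron calculation of Equation Set 13.10 is short enough to sketch directly, alongside the step-function neuron of Equation Set 13.8. The function names and example values below are hypothetical; the step version assumes the neuron fires only when z strictly exceeds the threshold.

```python
import math


def neuron_output(inputs, weights):
    """Equation 13.10: weighted sum followed by the bipolar sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights))
    # (e^z - e^-z) / (e^z + e^-z) is the hyperbolic tangent.
    return math.tanh(z)


def step_output(inputs, weights, threshold):
    """Equation Set 13.8: output is 1 only when z exceeds the threshold."""
    z = sum(x * w for x, w in zip(inputs, weights))
    return 1 if z > threshold else 0


# Hypothetical inputs and weights for a three-input neuron.
print(neuron_output([0.5, -0.2, 0.1], [0.8, 0.4, -1.0]))
print(step_output([1.0, 1.0], [0.5, 0.5], threshold=0.9))
```

The continuous output of `neuron_output` suits regression of continuous responses, while `step_output` suits binary classification.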


Individual neurons are assembled in a configuration such as
that illustrated in Figure 13.6, in which there are two process
inputs, $x_1$ and $x_2$, and one process response, y. However,
there is a third input to the ANN, bias, which provides an
offset functionality as does the constant, a, in Equations 13.1
through 13.4. The circles on the left side of the figure are
termed the input layer (as if it were a skin layer sensitive to 
heat or touch), and the right hand neuron, the output layer. 
The middle column of neurons is termed the hidden layer. 
This illustration shows a 2-3-1 network, where the numbers 
represent the number of external inputs or neurons in each 
layer. There should be one neuron in the output layer for each 
modeled output, and one in the input layer for each input 
variable. In general, for response variables that are continu- 
ous, there is only one output per NN. However, in an NN that 
classifies events, there could be one on-off output neuron for 
each classification category. There could be any number of 
neurons in the hidden layer, with a greater number provid- 
ing greater modeling flexibility. Some NNs have more than 
one hidden layer, but often placing the additional neurons 
in the single hidden layer is more efficient. The architecture 



FIG. 13.6 

Artificial neural network. 
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of an NN refers to the number of neurons in each layer, the 
transfer function used, the location of the bias influence, 
and other structural aspects to be discussed. Architecture to 
an NN would be similar to the choice of powers and cross- 
products chosen for a conventional regression model, such as 
Equation 13.4. Each neuron would use Equation Set (13.10) 
to convert its input signal values and weights to its output 
value. 

The NN of Figure 13.6 has 12 weights and a bias nomi- 
nally representing 13 coefficients that are adjusted to make 
the NN-modeled output best match the data. However, the 
bias multiplied by one of the weights on the bias can have 
the same value for an infinite number of combinations. 
Accordingly, a fixed bias value of unity is adequate, leaving 
12 adjustable coefficients. If there are N inputs,
there are (N + 1) values leaving the input layer. If there are M
neurons in the hidden layer, then there are M(N + 1)
weights in the hidden layer. With an additional
M weights in the output neuron, there is a total of M(N + 2)
adjustable coefficients to shape the NN output.
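The coefficient count and the feedforward calculation for a network like that of Figure 13.6 can be sketched as follows. This is a minimal illustration under stated assumptions: the bias is fixed at unity and enters only the hidden layer, every neuron uses Equation Set 13.10, and the weight values are hypothetical.

```python
import math


def count_weights(n_inputs, n_hidden):
    """Adjustable coefficients with a unity bias and one output neuron: M(N + 2)."""
    hidden_weights = n_hidden * (n_inputs + 1)  # each hidden neuron sees N inputs + bias
    output_weights = n_hidden                   # output neuron sees M hidden outputs
    return hidden_weights + output_weights


def forward(inputs, hidden_w, output_w):
    """Feedforward pass; every neuron applies Equation Set 13.10."""
    signals = list(inputs) + [1.0]  # fixed bias of unity appended to the inputs
    hidden_out = [math.tanh(sum(s * w for s, w in zip(signals, row)))
                  for row in hidden_w]
    return math.tanh(sum(h * w for h, w in zip(hidden_out, output_w)))


# Hypothetical weights for a 2-3-1 network (last column multiplies the bias).
hidden_w = [[0.5, -0.3, 0.1], [-0.8, 0.2, 0.0], [0.4, 0.9, -0.5]]
output_w = [0.7, -0.6, 0.3]

print(count_weights(2, 3))
print(forward([0.2, -0.4], hidden_w, output_w))
```

The whole model is a fixed sequence of multiplications, sums, and transfer function evaluations, so its execution time is the same for every input.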

In conventional regression modeling, the procedure
of finding best coefficient values is called optimization, or
least squares fitting. However, as an outcome of the NN origins,
in which the ANN was meant to learn the data, adjusting weights
is termed training, or learning. Classic nonlinear optimization
algorithms for such use would be Levenberg-Marquardt,
Hooke-Jeeves, generalized reduced-gradient, etc. However,
back propagation was devised for NNs as a one-step-at-a-time,
incremental optimization on sequentially occurring data,
and remains a popular method of “training.” In the
author's experience, however, traditional optimizers are much faster.
Conventionally, each optimization stage is termed an iteration,
but within the back propagation community, each complete
cycle of the optimizer is called an epoch.

NN language continues to follow educational language 
and concepts. A teacher presents information to the students 
who are to learn it. Consider teaching of ethics. We don’t 
want students to memorize data. Instead we want them to 
generalize underlying trends and behaviors that the data gen- 
erally describes, so that they can make ethical decisions in 
new situations. We could validate student learning of the gen- 
eral principles, as opposed to them memorizing case studies, 
by giving occasional quizzes on data not presented in class. If 
students had memorized in-class material, then they cannot 
do the new problems. However, if they have understood the 
general principles, then they will perform well on the new 
data. For NN training, this means you reserve a fraction (per- 
haps 10%) of data points as a validation set, and these are not 
used in the training. After each epoch (optimizer iteration), 
observe the NN ability to match the validation set. If learn- 
ing is improving at each epoch, as measured by a reduction 
in sum-of-squared residuals in the validation set, continue 
training. When there is no more improvement in the perfor- 
mance on the validation set, stop training. (If optimization 
iterations are reducing the sum of squared errors, then con- 
tinue optimization, else stop.) 
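The stopping rule just described can be sketched as a loop around any optimizer. The two callables are hypothetical placeholders for whatever training routine and validation calculation are in use, and the short demonstration replaces a real NN with a scripted sequence of validation results.

```python
def train_with_early_stopping(train_one_epoch, validation_ssd, max_epochs=1000):
    """Run epochs until the validation sum of squared residuals stops improving.

    train_one_epoch: callable performing one optimizer iteration (hypothetical).
    validation_ssd: callable returning the current SSD on the validation set.
    Returns (epochs_run, best_ssd).
    """
    best = validation_ssd()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        current = validation_ssd()
        if current >= best:  # no improvement on the validation set: stop
            return epoch, best
        best = current
    return max_epochs, best


# Demonstration with a scripted validation history instead of a real NN.
history = [10.0, 8.0, 6.0, 7.0, 5.0]
state = {"epoch": 0}


def fake_epoch():
    state["epoch"] += 1


def fake_validation_ssd():
    return history[state["epoch"]]


epochs_run, best_ssd = train_with_early_stopping(fake_epoch, fake_validation_ssd)
print(epochs_run, best_ssd)
```

A practical version would also keep a copy of the weights from the best epoch, since the weights at the moment of stopping belong to the epoch that failed to improve.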


Then finally, the teacher tests the students, to formally 
see if they are ready to be released upon society. The test and 
quiz problems do not duplicate examples presented in class; 
they represent independent tests of student proficiency. As a 
parallel for NN “training,” collect 1000 sets of data, withhold 
a random 200 for the test, and a random 100 for validation. 
Train the NN on 700 sets of data, stop the learning when the 
validation set performance stops improving, and then test. 
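The 700/100/200 partition in this example can be sketched with the standard library. The function name and the fixed seed are assumptions; the seed is there only to make the illustration repeatable.

```python
import random


def split_data(n_sets, n_test, n_validation, seed=0):
    """Randomly partition data-set indices into training, validation, and test."""
    indices = list(range(n_sets))
    random.Random(seed).shuffle(indices)
    test = indices[:n_test]
    validation = indices[n_test:n_test + n_validation]
    training = indices[n_test + n_validation:]
    return training, validation, test


training, validation, test = split_data(1000, n_test=200, n_validation=100)
print(len(training), len(validation), len(test))  # 700 100 200
```

Because the three lists are slices of one shuffled list, no data set appears in more than one of them.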

This three-stage approach is in distinct contrast to con- 
ventional nonlinear regression practice, in which all 1000 
sets of data are used for coefficient adjustment, stopping is 
based on inconsequential improvement for the same data, 
and the “is it a good enough model?” evaluation is also based 
on all of the same data. In many NN applications, users are 
rejecting the legacy validation/testing methods and returning 
to conventional optimizers and methods for NN “training,” 
in which “learning” is based on all of the expensive data 
collected, not just 70%. However, many use the
learning-validation-test approach with success.

Since the output of a neuron is bounded, the output of the 
NN is bounded. In the transfer function of Equation 13.9, the 
output is bounded between -1 and +1. If the process data have 
a different scale, then have the NN output model a scaled 
variable. Use Equation 13.11 to scale the output process vari- 
able from -0.8 to +0.8 so that (1) the “z” signal internal to the 
NN does not have to go to ±infinity, and (2) it can cope with 
new data that might extend the range: 

$$y = -0.8 + 1.6\,\frac{PV - PV_{Min}}{PV_{Max} - PV_{Min}} \tag{13.11}$$

Input data should also be scaled. With the bipolar sigmoidal
transfer function, z-values outside of the ±2 range lead to
saturation of the transfer function output (y-values nearly ±1,
which are then insensitive to variation in the z-value). Large
values for the weights on the bias, or small values for weights 
on the input signals could relocate the summed z-value to be 
within the -2 to +2 range to create a nonsaturated neuron 
response, but scaling input variables to a -1 to +1 basis accel- 
erates the network optimization (training) by effectively lim- 
iting the weight values within a -2 to +2 range. Use Equation 
13.12 to scale the input process variable: 

$$x = -1 + 2\,\frac{PV - PV_{Min}}{PV_{Max} - PV_{Min}} \tag{13.12}$$

Alternately, use Equation 13.11 for input scaling as a
one-method approach to avoid confusing the user.
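Equations 13.11 and 13.12 translate directly into code. The sketch below also includes the inverse of Equation 13.11, which is needed to convert an NN output back to engineering units; the function names and the 50 to 150 range are hypothetical.

```python
def scale_output(pv, pv_min, pv_max):
    """Equation 13.11: map the process variable onto -0.8 to +0.8."""
    return -0.8 + 1.6 * (pv - pv_min) / (pv_max - pv_min)


def scale_input(pv, pv_min, pv_max):
    """Equation 13.12: map the process variable onto -1 to +1."""
    return -1.0 + 2.0 * (pv - pv_min) / (pv_max - pv_min)


def unscale_output(y, pv_min, pv_max):
    """Inverse of Equation 13.11: recover engineering units from the NN output."""
    return pv_min + (y + 0.8) * (pv_max - pv_min) / 1.6


# Hypothetical temperature range of 50 to 150 units.
print(scale_input(100.0, 50.0, 150.0))
print(scale_output(150.0, 50.0, 150.0))
print(unscale_output(scale_output(120.0, 50.0, 150.0), 50.0, 150.0))
```

The minimum and maximum used for scaling should be stored with the trained weights, because the same values are required every time the model is used.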

Common NN Options 

Two transfer functions (step after a threshold and bipolar sig- 
moidal) were revealed in Equations 13.8 and 13.9. However, 
there are many others including similar ones that are shifted 
upward so that the output range is in the 0-1 range. In such 
cases, the scaling of the output variable needs to be adjusted
to match the neuron output range. The transfer functions may 
also be translated to the left or right by adjusting the input 
thresholds to fire a neuron. 

There are transfer functions that are similar to the sigmoidal,
such as a linear ramp between saturation limits:

$$y = \begin{cases} -1, & z < -a \\ -1 + \dfrac{2(z + a)}{2a}, & -a \le z \le a \\ +1, & z > a \end{cases} \tag{13.13}$$

And, there are those that are similar to the step function, but
which might have several discrete steps:

$$y = \begin{cases} 0, & z < a \\ 0.33, & a < z < b \\ 0.67, & b < z < c \\ 1, & z > c \end{cases} \tag{13.14}$$

In general, for applications that have continuous response
values, use those with continuous output values; and for
classification use those with discrete output values.

The NN of Figure 13.6 would represent a steady-state or
process analysis relation such as one described in Equation
13.4. However, dynamic models use past as well as present
values, as illustrated in Equation 13.5, representing dynamics
in a linear autoregressive moving-average relation. For a
process that is modeled as moving average:

$$y_i = a + b_1x_{i-1} + b_2x_{i-2} \tag{13.15}$$

The NN architecture would use the past x-value and second
past x-value as inputs, as shown in Figure 13.7. With two past
inputs in the model, this is a moving average (MA) of order 2,
indicated by MA(2). If higher order MA models are desired,
place additional input nodes in the input layer.

For a process that is modeled as autoregressive with
one past output value influencing the present value, AR(1),
and one past external input, MA(1), the model is called
ARMA(1,1):

$$y_i = a_1y_{i-1} + b_1x_{i-1} \tag{13.16}$$


The NN architecture would use the past x-value and past 
y-value as inputs, as shown in Figure 13.8, in which the “D” 
block represents a simple one-step delay. If it is desired to 
represent higher order AR functionality, then add sequential 
delay blocks to obtain the series of delayed y-values, and add 
nodes in the input layer to receive each of them. 

Figure 13.8 illustrates a recurrent network, which means 
that an output value is fed back into the NN. The user might 
choose to place the feedback as an input to a neuron in the 
hidden layer, or take the output from a hidden layer neuron 
and feed it back to the input layer. In contrast to a NN with an 
internal feedback signal (such as illustrated in Figure 13.8), 
when all signals propagate from the input layer toward the 
output (such as illustrated in Figures 13.6 and 13.7), the NN 
is termed feedforward. 
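The feedback path through the delay block can be sketched as a free-run simulation. A linear ARMA(1,1) relation with hypothetical coefficients stands in for the NN here; any function of the delayed output and the past input could be substituted.

```python
def simulate_recurrent(one_step_model, x_sequence, y_initial):
    """Free-run simulation: each model output is delayed one step and fed back."""
    y_delayed = y_initial  # content of the "D" block
    outputs = []
    for x_past in x_sequence:
        y_now = one_step_model(y_delayed, x_past)
        outputs.append(y_now)
        y_delayed = y_now  # the delay block stores the output for the next step
    return outputs


# Linear stand-in for the NN, with hypothetical coefficients in Equation 13.16.
def arma_1_1(y_past, x_past, a1=0.5, b1=0.5):
    return a1 * y_past + b1 * x_past


response = simulate_recurrent(arma_1_1, [1.0, 1.0, 1.0], y_initial=0.0)
print(response)  # [0.5, 0.75, 0.875]
```

Because the model sees its own past output rather than a measurement, prediction errors can accumulate over a long free-run simulation.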

Examples so far have not shown a transfer function 
operation in the input node. Generally, this is only needed in 
the hidden and output layer. However, radial basis networks 
place a Gaussian-like function in the input layer, as presented 
in Equation 13.17. The function is defined by a center, c, and
a width, w. When the value of x is close to the center, relative
to width, the output value of the input node is nearly
unity. When x is either above or below the center, the output
value drops to nearly zero. In this method, the input only
propagates a value to the hidden layer neurons when the input
value is near to some target:

$$y = e^{-((x - c)/w)^2} \tag{13.17}$$
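A minimal sketch of the input-node function of Equation 13.17 shows how the response falls away from the center. The function name and the example center and width are hypothetical.

```python
import math


def radial_basis(x, center, width):
    """Equation 13.17: Gaussian-like response of an input node."""
    return math.exp(-((x - center) / width) ** 2)


# Hypothetical node centered at 2.0 with a width of 0.5.
for x in (2.0, 2.5, 3.0, 5.0):
    print(x, radial_basis(x, center=2.0, width=0.5))
```

At the center the output is exactly 1, one width away it is about 0.37, and several widths away it is effectively zero.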



FIG. 13.7 

Artificial neural network for a MA model. 
FIG. 13.8

ANN for an autoregressive model. 


As a final option mentioned here, the NN model can be used
to capture the inverse mechanism. If steady-state data was
normally obtained by changing the input to a process and
measuring the response, then the data base columns for input
and output can be switched. As a simplistic example in
process control applications, response variables in a
distillation column are top and bottom product compositions, $c_T$
and $c_B$, and influence variables include the two manipulated
variables reflux flow rate, L, and boiler power, P. Influence
variables also include several process and environmental
conditions including feed flow rate, F, and feed composition,
$c_F$. Normally, a rigorous model calculates the steady-state
response as a function of the influences:

$$\{c_T, c_B\} = \mathrm{Model}(L, P, F, c_F) \tag{13.18}$$

However, once the data sets $\{L, P, F, c_F, c_T, c_B\}$ are calculated,
the columns can be switched $\{F, c_F, c_T, c_B, L, P\}$ and an NN
can be trained to calculate values for L and P, given values for
F, $c_F$, $c_T$, and $c_B$. Then, given values for environmental inputs,
F and $c_F$, and desired values (set points) for the process
outputs, $c_T$ and $c_B$, the NN can calculate the steady-state values
of manipulated variables. Since the NN output is bounded,
the inverse of the control problem is bounded. Since the NN is
a deterministic sequence of feedforward calculations, a solution
is guaranteed, and the computational time is guaranteed.
There are many control structures that are based on nonlinear
steady-state models and linear dynamics. Ramchandran [10]
and Ramchandran and Rhinehart [11] used this technique for
control of commercial-scale and lab-scale binary distillation.

NNs are used to model nonlinear processes, which present
a common problem for nearly any model optimization. There
may be local minima on the “surface,” which may trap an
optimization at a good, but not the best, set of weights. Normally,
NNs are “trained” multiple times from randomized initial
weight values within the starting range of -1 to +1, and the
best of the N optimizations is used as the NN weight values.
Often the best of 5-10 trainings from independent initial random
weight values is reported. Iyer and Rhinehart [12] provide
Equation 13.19 for determining the number of independent
trainings, N, that is based on the user’s desired confidence, c,
that one of the best possible fraction of all outcomes, f, is found:

$$N = \frac{\ln(1 - c)}{\ln(1 - f)} \tag{13.19}$$

For example, to be 95% confident (c = 0.95) that the best of N
independent optimizations will converge to one of the best
possible 10% (f = 0.10) of all possible outcomes,
Equation 13.20 indicates that there should be
N = 28 (rounded) independent trainings:

$$N = \frac{\ln(1 - 0.95)}{\ln(1 - 0.10)} = 28.4331\ldots \tag{13.20}$$
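Equation 13.19 is easy to evaluate for other choices of confidence and best fraction. The function and argument names below are hypothetical; the formula is the one given in the text.

```python
import math


def required_trainings(confidence, best_fraction):
    """Equation 13.19: N = ln(1 - c) / ln(1 - f)."""
    return math.log(1.0 - confidence) / math.log(1.0 - best_fraction)


n = required_trainings(confidence=0.95, best_fraction=0.10)
print(n)             # about 28.43, as in Equation 13.20
print(math.ceil(n))  # 29 if the stated confidence must be met or exceeded

# A looser target needs far fewer independent trainings.
print(required_trainings(confidence=0.90, best_fraction=0.20))
```

Rounding to the nearest integer gives the 28 quoted in the text, while rounding up gives 29 for a user who wants at least the requested confidence.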


EXAMPLE 

Figure 13.9 shows a SISO process response to a sawtooth pat- 
tern in the process input. The continuous line is the controller 
output, the influence variable, with values indicated on the 
right-hand vertical axis. The measured process response is 
the dots at each sampling, which correspond to the left-hand 
vertical axis. The process response is generated by a simula- 
tor, which is nonlinear, fourth-order, noisy, and affected by 
an external disturbance with persistence driven by random 
noise. The delay in the simulator is 3 s (which converts to 15 
simulation time steps), and can be best seen in the middle of 
Figure 13.9 when the ramp up in input takes an immediate 
drop down. The drop in process response is delayed. 

FIG. 13.9

Example of a dynamic SISO nonlinear process response.

The modeling objective will be to predict the fifth future
data point from a current and past model output (y) and a
past model input (u). This is an ARMA(2,1) type of model, as
illustrated in the following equation:

$$y_{i+5} = ay_i + by_{i-N} + cu_{i-M} \tag{13.21}$$

But, the NN will permit it to be a nonlinear model. Further, 
the model will use the delay of M samplings in the influence, 
u, and N samplings in the past y value. 
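Before training, the recorded y and u series have to be arranged into input and target pairs that match Equation 13.21. The following is a minimal sketch of that bookkeeping; the function name and the synthetic series are hypothetical, and the delays are left as arguments.

```python
def build_samples(y, u, n_delay, m_delay, horizon=5):
    """Pair inputs (y_i, y_{i-N}, u_{i-M}) with the target y_{i+horizon}."""
    start = max(n_delay, m_delay)  # first index with a full history
    samples = []
    for i in range(start, len(y) - horizon):
        inputs = (y[i], y[i - n_delay], u[i - m_delay])
        samples.append((inputs, y[i + horizon]))
    return samples


# Synthetic series so that each value reveals its own index.
y_series = [float(k) for k in range(40)]
u_series = [100.0 + k for k in range(40)]

samples = build_samples(y_series, u_series, n_delay=6, m_delay=20)
print(len(samples))
print(samples[0])
```

Each tuple of three inputs would feed the three input nodes of the NN, and the paired target is the value the output neuron is trained to match.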

Mistakes in Regression Modeling 
and Parallels in the Use of NNs 

Regardless of the regression modeling approach, conventional
or NNs, there are some dos and don’ts in modeling.

Do select input variables that best match underlying mech- 
anisms. Often, this selection requires an exploration of corre- 
lation between variables. For the example process in Figure 
13.9, the delay is 15 time increments, but with the higher order 
process, the smaller time constants make an effective delay of 
about 20 time steps. Accordingly, M should be about 20 in the 
Equation 13.21 model. Whether using an NN or conventional 
models, the user must select the right set of inputs. 

A second “Do” is to select the right model functional- 
ity (architecture). Too simple a model will be inadequate to 
capture the mechanisms. One may start with an ARMA(1,1)
model, then explore ARMA(1,2), ARMA(1,3),
ARMA(2,1), ARMA(3,1), ..., ARMA(2,2), ... structures.
One might also start with three neurons in the hidden layer, 
then try 4, 5, ... until getting a “right” model. A right model 
provides adequate modeling goodness (perhaps as indicated 
by sum-of-squared-deviations (SSD) between model and 
measurement) with the least complexity (with fewest num- 
bers of inputs and model parameters). Whether using an 
NN or conventional models, the user must select the right 
architecture. 
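The idea of a "right" model, adequate goodness with the least complexity, can be sketched as a simple selection rule. The function names, the SSD target, and the candidate results below are hypothetical and serve only to illustrate the rule.

```python
def sum_of_squared_deviations(model_values, measured_values):
    """SSD between model output and measurement."""
    return sum((m - d) ** 2 for m, d in zip(model_values, measured_values))


def simplest_adequate(candidates, ssd_target):
    """candidates: list of (n_parameters, ssd) pairs.

    Returns the pair with the fewest parameters whose SSD meets the target,
    or None if no candidate is adequate.
    """
    adequate = [c for c in candidates if c[1] <= ssd_target]
    return min(adequate) if adequate else None


print(sum_of_squared_deviations([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))

# Hypothetical results from three architectures of increasing size.
candidates = [(12, 40.0), (16, 20.5), (20, 20.1)]
print(simplest_adequate(candidates, ssd_target=25.0))
```

In this illustration the largest network has a slightly lower SSD, but the middle one is selected because it meets the target with fewer coefficients.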

Figure 13.10 shows the ability of a best conventional (lin- 
ear) second-order-plus-deadtime (SOPDT) model to match 
the data. The continuous line represents the SOPDT model 
output. Note the mismatch at either high or low process values.
The optimizer decided that a delay of 13 samples was best
for capturing the u impact. The SSD value is 176 (y-units²).



FIG. 13.10 

Comparison of SOPDT model to process response. 

By contrast, Figure 13.11 shows the ability of the NN to predict
gain nonlinearity in the data. The NN uses a u-delay of
20 samples, a y-delay of six samples, and has four neurons in
the hidden layer, and the SSD is 20.5 (y-units²), more than an
eightfold reduction.

The model and process data are nearly identical and 
Figure 13.12 provides a better representation of the goodness 
of model fit. 

Do gather enough data to be able to confidently generate a 
model. Be sure that there are about three times as many data 
sets as adjustable coefficients, and that the data expresses the 
entire range. 

Do decide if the model is sufficient for function. If a model 
is used in continuous control, then feedback at each sampling 
fine tunes past action, and an 80% right model may be fully 
adequate. If the control action is 80% right, it is 20% wrong. If
the 20% is corrected at each sampling, then, roughly, five
sequential control actions will leave a $0.20^5 \approx 0.03\%$ residual
error. Alternately, if the model represents a one-shot decision,
then it may need to be 99.999% correct. The utility of a
model is not dependent on whether it is an NN or a classical 
power series regression. Utility for use is what matters, not 
r-square, not SSD. 
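The rough arithmetic behind the 80% argument can be checked in a few lines. This is only a sketch of the idealized reasoning in the text, which assumes that every control action removes the same fraction of the remaining error; the function name is hypothetical.

```python
def residual_after(corrections, fraction_right=0.80):
    """Fraction of the original error left after repeated feedback corrections."""
    return (1.0 - fraction_right) ** corrections


for k in range(1, 6):
    print(k, f"{residual_after(k):.4%}")
```

After five corrections the residual is about 0.03% of the original error, which is why a model that is only approximately right can still be adequate inside a feedback loop.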

Comparison of NNs to Conventional 
Regression Modeling 

There are advantages and disadvantages of the use of NNs
compared to conventional regression approaches:

• NNs model nonlinear processes without the user needing
to define functional relationships. In a sense they are
model free.

• NNs compute rapidly compared to first-principle 
models, which makes them advantageous for real-time 
automation applications. 

• Like any regression model, creator choices (of input 
variables, architecture, and selection of training data 
set) control NN validity. 
FIG. 13.11

Comparison of NN prediction to process response.



FIG. 13.12

Comparison of NN model to process response.

• If the process changes, a new model is needed. 

• There is a learning curve associated with develop- 
ing skills and experience to make the right choices in 
using any regression technique. 

• There remains an acceptance barrier to NNs due to 
anthropomorphic terminology and persistence of folk- 
lore. Hopefully this chapter debunked the erroneous 
viewpoints. 


NN APPLICATIONS IN AUTOMATION 

References cited at the end of this chapter note some indus- 
trial applications reported in the scientific journal literature. 
Applications include sintering, flocculation, grinding, distil- 
lation, measurement calibration, valve fault diagnosis, and 
distillation fault diagnosis, which reveal the wide applicability
for NNs within industry [8,13-19]. Among thousands,
some further examples of applications of NNs are also listed
in the references [20-33].


CONCLUSIONS AND COMMENTS 

NNs are a useful tool in process control and automation, and a 
multitude of applications have demonstrated successes. They 
are deterministic with bounded outputs. They are computa- 
tionally faster than first-principle models and more flexible 
than conventional regression models. If you are considering 
nonlinear regression as a tool, consider NNs. However, as 
with any modeling tool, the user needs to understand how it 
works, and how to use it properly. 